pacman::p_load(outbreaker2, ape, igraph, vegan, distcrete, tidyverse, scales)
pacman::p_load_gh("CyGei/o2ools")
devtools::load_all()

Assess the performance of the test
Type I error (False Positive)
A Type I error, or false positive, occurs when the null hypothesis is rejected despite being true. In this context, it means concluding that two (or more) posterior chains differ when they are, in fact, identical.
We run outbreaker2 to produce a single posterior chain. We then create two (or more) bootstrapped samples by sampling with replacement from the same posterior chain and apply a statistical test to compare them. The test should yield a non-significant result (p-value > 0.05) to correctly suggest no difference.
Running outbreaker2
set.seed(111)
x <- outbreaker(
  data = outbreaker_data(
    dna = fake_outbreak$dna,
    dates = fake_outbreak$sample,
    ctd = fake_outbreak$ctd,
    w_dens = fake_outbreak$w,
    ids = as.character(1:30)
  ),
  config = create_config(
    init_pi = 1,
    move_pi = FALSE,
    find_import = FALSE
  )
)
x <- x[x$step > 1000, ] # burnin
chain <- o2ools::get_trees(out = x, ids = as.character(1:30))
# Function to remove the introduction in each tree of the chain (e.g. NA->1)
prep_chain <- function(chain) {
  clean <- lapply(chain, function(x) {
    total_nas <- sum(is.na(x))
    from_nas <- sum(is.na(x$from))
    if (total_nas == 1 && from_nas == 1) {
      x <- x[!is.na(x$from), ]
      x[] <- lapply(x, as.character)
      return(x)
    } else {
      stop("DataFrame must have exactly one NA, and it must be in the 'from' column")
    }
  })
  return(clean)
}
chain <- prep_chain(chain)
invisible(lapply(chain, check_tree)) # checks that the trees are in the right format

Chi-square test
The chi-square test is performed 1000 times, with new samples drawn for each iteration. Under the null hypothesis, we expect around 5% of the tests to yield a p-value < 0.05 due to random chance.
For example:
set.seed(111)
lambda <- 20
t <- replicate(1000, {
  x <- rpois(n = 10, lambda)
  y <- rpois(n = 10, lambda)
  chisq.test(cbind(x, y))$p.value
})
mean(t < 0.05) # Should be close to 0.05
[1] 0.052
hist(t) # Should look roughly uniform

Note that when \(\lambda\) is small (e.g. when a cell in the contingency table has a value < 5), Fisher’s exact test should be used instead. This is particularly relevant when comparing posterior chains, as a pair such as A->B may appear a few times in one chain but never (0) in the other.
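To illustrate the point above, here is a quick sketch (not part of the original analysis) comparing the two tests on a sparse 2x2 table where a pair is observed in one chain but never in the other:

```r
# Hypothetical counts: pair A->B seen 3 times in chain1, never in chain2
tab <- matrix(c(3, 7, 0, 10), nrow = 2,
              dimnames = list(pair = c("A->B", "other"),
                              chain = c("chain1", "chain2")))

# chisq.test() warns that the chi-square approximation may be incorrect here
p_chisq <- suppressWarnings(chisq.test(tab)$p.value)

# fisher.test() returns an exact p-value regardless of cell counts
p_fisher <- fisher.test(tab)$p.value

c(chisq = p_chisq, fisher = p_fisher)
```

With such small counts the two p-values can differ noticeably, which is why the exact test is preferable for sparse contingency tables.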
To assess the sensitivity of the test to sample size, we vary the number of trees sampled from the posterior distribution (chain) in each replication.
sample_sizes <- c(10, 50, 100, 500)
set.seed(111)
results <- lapply(sample_sizes, function(n) {
  test <- replicate(100, {
    get_chisq(
      sample(chain, size = n, replace = TRUE),
      sample(chain, size = n, replace = TRUE)
      # test_args = list(simulate.p.value = TRUE, B = 1000)
    )$p.value
  })
})
lapply(results, function(p_vals) {
  mean(p_vals < 0.05) # Should be close to 0.05
})
[[1]]
[1] 0
[[2]]
[1] 0
[[3]]
[1] 0
[[4]]
[1] 0
We don’t observe the expected 5% of false positives, likely because the chi-square test is applied to tables with low cell frequencies. Fisher’s exact test is computationally intensive, so we won’t try it out here, but you can specify method = "fisher" in epitree::get_chisq().
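A cheaper middle ground, hinted at by the commented-out test_args above, is chisq.test()'s built-in Monte Carlo p-value, which avoids the asymptotic approximation without the full cost of Fisher's exact test. A minimal sketch on a sparse, hypothetical table:

```r
set.seed(111)

# Hypothetical sparse table (not from the posterior chains)
tab <- matrix(c(3, 7, 0, 10), nrow = 2)

# simulate.p.value = TRUE draws B tables under the null with the same
# margins and computes the p-value from the simulated statistics
p_mc <- chisq.test(tab, simulate.p.value = TRUE, B = 2000)$p.value
p_mc
```

This is the same mechanism the commented test_args would pass through to the underlying test.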
PERMANOVA
Same methodology, using vegan::adonis2(). However, we use only one replicate, as the test takes a long time to run.
sample_sizes <- c(10, 50, 100, 500)
results <- lapply(sample_sizes, function(n) {
  test <- replicate(1, {
    compare_chains(sample(chain, size = n, replace = TRUE),
                   sample(chain, size = n, replace = TRUE))$`Pr(>F)`[1]
  })
})
results
[[1]]
[1] 0.625
[[2]]
[1] 0.268
[[3]]
[1] 0.477
[[4]]
[1] 0.363
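The internals of compare_chains() are not shown here, but the general PERMANOVA idea can be sketched in base R: encode each tree as a binary vector of infector-infectee pairs, compute pairwise Jaccard distances, and compare between-chain vs within-chain dissimilarity under label permutation. This toy sketch uses made-up trees, not the real posterior:

```r
set.seed(111)

# Toy "trees": each tree encoded as a set of from->to edges (made up for illustration)
make_tree <- function(shift) paste0(1:5, "->", 2:6 + shift)
trees <- c(replicate(20, make_tree(0), simplify = FALSE),
           replicate(20, make_tree(sample(0:1, 1)), simplify = FALSE))
labels <- rep(c("A", "B"), each = 20)

# Binary edge-presence matrix: one row per tree, one column per observed edge
edges <- unique(unlist(trees))
mat <- t(sapply(trees, function(tr) as.integer(edges %in% tr)))

# Jaccard distances between trees, then a PERMANOVA-style permutation test:
# does the between-group dissimilarity exceed the within-group dissimilarity?
d <- as.matrix(dist(mat, method = "binary"))
stat <- function(lab) mean(d[lab == "A", lab == "B"]) - mean(d[lab == "A", lab == "A"])

obs <- stat(labels)
perm <- replicate(999, stat(sample(labels)))
p_value <- (sum(perm >= obs) + 1) / 1000
p_value
```

vegan::adonis2() does this properly via a pseudo-F statistic on the distance matrix; the sketch above only conveys the permutation logic.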
Type II error (False Negative)
A Type II error, or false negative, occurs when the null hypothesis is accepted despite being false. In this context, it means concluding that two (or more) posterior chains are identical when they are, in fact, different.
To assess a Type II error, we require at least two distinct outbreak scenarios. For example, one scenario involving super-spreading (heterogeneous transmission) and another with no over-dispersion (homogeneous transmission), where all individuals have the same reproduction number \(R\).
The package simulacr will help us simulate such outbreaks.
pacman::p_load_gh("CyGei/simulacr")

Super-spreading outbreak chain_ss
True transmission tree
We specify an offspring distribution using rnbinom(n = 100, size = 0.2, mu = 3), where the small dispersion parameter (\(k\), or size) indicates high overdispersion. This results in heterogeneous transmission, with substantial variation in the number of secondary cases generated by each infector.
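As a quick sanity check (not part of the simulation itself), we can compare the spread of this offspring distribution with a Poisson of the same mean:

```r
set.seed(111)

# Overdispersed offspring distribution: var = mu + mu^2 / size = 3 + 9 / 0.2 = 48
r_nb <- rnbinom(n = 1e5, size = 0.2, mu = 3)

# Homogeneous benchmark: Poisson with the same mean, var = mean = 3
r_pois <- rpois(n = 1e5, lambda = 3)

c(var_nb = var(r_nb), var_pois = var(r_pois))
```

The negative binomial variance is an order of magnitude larger, i.e. a few infectors generate many secondary cases while most generate none.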
set.seed(111)
sim_ss <-
  simulacr::simulate_outbreak(
    duration = 100,
    population_size = 20,
    R_values = rnbinom(n = 100, size = 0.2, mu = 3), # size is k (dispersion parameter)
    dist_incubation = fake_outbreak$w,
    dist_generation_time = fake_outbreak$w
  )$data
epic_ss <- as_epicontacts(sim_ss)
plot(epic_ss)

Outbreak reconstruction
Let’s use sim_ss’s data as input for outbreaker2. The resulting output we’ll use is called chain_ss.
out_ss <- outbreaker(
  data = outbreaker_data(
    dates = sim_ss$date_onset,
    ctd = epic_ss$contacts[-1, ],
    w_dens = fake_outbreak$w,
    ids = as.character(epic_ss$linelist$id)
  ),
  config = create_config(
    init_pi = 1,
    move_pi = FALSE,
    find_import = FALSE
  )
)
plot(out_ss)

out_ss <- out_ss[out_ss$step > 1000, ] # burnin
plot(out_ss, type = "network")

chain <- get_trees(out = out_ss, ids = as.character(epic_ss$linelist$id))
chain_ss <- prep_chain(chain)
invisible(lapply(chain_ss, check_tree))

Check that outbreaker2 did a good job by computing the proportion of correctly assigned ancestries.
o2ools::get_accuracy(chain_ss, epic_ss$contacts[-1, ]) %>%
  hist(
    main = "Accuracy of outbreak reconstruction",
    xlab = "% of correctly inferred ancestries",
    ylab = "Count"
  )

“Uniform” outbreak
True transmission tree
Here we provide the same reproduction number \(R\) for everyone. For a fair comparison, I ran a separate script to find a random seed that yields the same number of cases as out_ss, and we randomly assign the case IDs from out_ss to the simulation below. The resulting output we’ll use is called chain_uni.
set.seed(3961)
sim_uni <-
  simulacr::simulate_outbreak(
    duration = 100,
    population_size = 20,
    R_values = 1L,
    dist_incubation = fake_outbreak$w,
    dist_generation_time = fake_outbreak$w
  )$data
epic_uni <- as_epicontacts(sim_uni)
plot(epic_uni)

@Thibaut: R_values is not respected (some cases have R = 2). Is this normal?
Outbreak reconstruction
Let’s use sim_uni’s data as input for outbreaker2.
out_uni <- outbreaker(
  data = outbreaker_data(
    dates = sim_uni$date_onset,
    ctd = epic_uni$contacts[-1, ],
    w_dens = fake_outbreak$w,
    ids = as.character(epic_uni$linelist$id)
  ),
  config = create_config(
    init_pi = 1,
    move_pi = FALSE,
    find_import = FALSE
  )
)
plot(out_uni)

out_uni <- out_uni[out_uni$step > 1000, ] # burnin
plot(out_uni, type = "network")

# assign the IDs from sim_ss
common_ids <- sample(as.character(epic_ss$linelist$id))
chain <- get_trees(out = out_uni, ids = common_ids)
chain_uni <- prep_chain(chain)
invisible(lapply(chain_uni, check_tree))

Comparing two distinct posterior chains
chain_ss refers to the super-spreading outbreak and chain_uni to the “uniform” outbreak.
Chi-square test
sample_sizes <- c(10, 50, 100, 500)
set.seed(111)
results <- lapply(sample_sizes, function(n) {
  test <- replicate(100, {
    get_chisq(
      sample(chain_ss, size = n, replace = TRUE),
      sample(chain_uni, size = n, replace = TRUE)
      # test_args = list(simulate.p.value = TRUE, B = 1000)
    )$p.value
  })
  return(mean(test < 0.05))
})
names(results) <- paste("Sample Size:", sample_sizes)
results
$`Sample Size: 10`
[1] 1
$`Sample Size: 50`
[1] 1
$`Sample Size: 100`
[1] 1
$`Sample Size: 500`
[1] 1
PERMANOVA
We only replicate the test once due to running time.
sample_sizes <- c(10, 50, 100, 500)
system.time({
  results <- lapply(sample_sizes, function(n) {
    test <- replicate(1, {
      compare_chains(
        sample(chain_ss, size = n, replace = TRUE),
        sample(chain_uni, size = n, replace = TRUE)
      )$`Pr(>F)`[1]
    })
  })
  names(results) <- paste("Sample Size:", sample_sizes)
})
   user  system elapsed
  22.86    1.92   24.79
results
$`Sample Size: 10`
[1] 0.001
$`Sample Size: 50`
[1] 0.001
$`Sample Size: 100`
[1] 0.001
$`Sample Size: 500`
[1] 0.001
Sensitivity Analysis
We now want to perform a sensitivity analysis on two posterior chains of transmission trees (chain_A, chain_B) to investigate how the sample size (sample_sizes) and the proportion of overlap between the two chains (overlap_freqs) affect the performance of the test (compare_chains). See run_analysis.R
source(here::here("analysis/performance/R", "run_analysis.R"))
print(run_analysis)

It first generates a grid of all possible combinations of sample_sizes and overlap_freqs. For each combination of parameters:

Two samples are created:
- reference: a sample of size n from chainA only.
- mixed: a sample of size n combining chainA and chainB. It draws from chainA with probability overlap_freq and from chainB with probability 1 - overlap_freq.

The test compares reference with mixed and returns a p-value. This step is repeated n_repeats times.

The function returns a 3D array:
- The rows represent the iteration id (n_repeat).
- The columns represent the sample_size.
- The 3rd dimension represents the overlapping frequency between the two chains (overlap_freq).
- The entries are the corresponding p-values.
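The reference/mixed sampling step can be sketched as follows (a minimal illustration with hypothetical stand-in chains; the actual implementation lives in run_analysis.R):

```r
set.seed(111)

# Hypothetical mini-chains standing in for chainA / chainB
chainA <- as.list(letters[1:5])
chainB <- as.list(LETTERS[1:5])

# Draw a mixed sample of size n: each slot comes from chainA with
# probability overlap_freq, otherwise from chainB
sample_mixed <- function(chainA, chainB, n, overlap_freq) {
  from_A <- runif(n) < overlap_freq
  out <- vector("list", n)
  out[from_A]  <- sample(chainA, sum(from_A), replace = TRUE)
  out[!from_A] <- sample(chainB, sum(!from_A), replace = TRUE)
  out
}

# reference draws from chainA only; mixed blends both chains
reference <- sample(chainA, size = 10, replace = TRUE)
mixed <- sample_mixed(chainA, chainB, n = 10, overlap_freq = 0.7)
length(mixed)
```

At overlap_freq = 1 the mixed sample is indistinguishable from the reference; at overlap_freq = 0 it is drawn entirely from chainB.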
result <- run_analysis(
chainA = chain_uni,
chainB = chain_ss,
sample_sizes = c(50, 200, 1000),
overlap_freqs = seq(0, 1, 0.1), #generate_sequence(0, 1, 0.1)
n_repeats = 1000
)
# Took 51 hours to run!!!!! @Thibaut

Let’s read the results from the simulation.
run_date <- "2025-01-08"
result <- readRDS(here::here("analysis/performance/data", run_date, "result.rds")) %>% reshape2::melt()
str(result)
'data.frame': 33000 obs. of 4 variables:
$ n_repeat : int 1 2 3 4 5 6 7 8 9 10 ...
$ sample_size : int 50 50 50 50 50 50 50 50 50 50 ...
$ overlap_freq: num 0 0 0 0 0 0 0 0 0 0 ...
$ value : num 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001 ...
The plot shows the distribution of p-values comparing two chains of posterior transmission trees over 1000 iterations.
With 50 trees per chain, the test detects a significant difference unless over 70% of the samples are from a shared distribution.
For chains with 200 trees, this threshold rises to 90%.
For 1000 trees, the test accepts the null hypothesis of no difference only when the chains are entirely drawn from the same posterior distribution.
Overall, as sample size increases, the test becomes more sensitive to differences between the chains, requiring a higher proportion of shared samples to accept the null hypothesis of no difference.
Alternatively, we can look at the frequency of p-values that fall below versus at or above 0.05. Again, we expect on average 5% of false positives for two identical chains.
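Given the melted result data frame above, these rejection frequencies can be tabulated per sample_size and overlap_freq. A sketch using a toy data frame with the same columns (placeholder p-values, not the real simulation output):

```r
set.seed(111)

# Toy stand-in with the same columns as the melted `result`
toy <- expand.grid(n_repeat = 1:100,
                   sample_size = c(50, 200, 1000),
                   overlap_freq = c(0, 0.5, 1))
toy$value <- runif(nrow(toy)) # placeholder p-values

# Proportion of tests rejecting H0 (p < 0.05) in each condition
rej <- aggregate(value ~ sample_size + overlap_freq, data = toy,
                 FUN = function(p) mean(p < 0.05))
rej
```

Applied to the real result, rows with overlap_freq = 1 should sit near 5% (the nominal false-positive rate) and rows with overlap_freq = 0 near 100%.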
TO DISCUSS
We will be computing the following (note FPR = 1 − TNR, i.e. 1 − specificity, and FNR = 1 − TPR):
| Metric | Definition | Chains are identical | Partial overlap | Chains are distinct |
|---|---|---|---|---|
| False Positive Rate (FPR) (type I error) | Proportion of tests incorrectly rejecting the null hypothesis (detecting a difference when there is none). | Should be close to 5% (at α = 0.05) | Unclear | NA |
| True Positive Rate (TPR) (sensitivity) | Proportion of tests correctly rejecting the null hypothesis (detecting a difference when there is one). | NA | Unclear | Should be close to 100% |
| True Negative Rate (TNR) (specificity) | Proportion of tests correctly accepting the null hypothesis (detecting no difference when there is none). | Should be close to 95% | Unclear | NA |
| False Negative Rate (FNR) (type II error) | Proportion of tests incorrectly accepting the null hypothesis (detecting no difference when there is one). | NA | Unclear | Should be close to 0% |
| Accuracy | Proportion of correct results (true positives + true negatives). | Close to 95% | Unclear | Should be close to 100% |
| Optimal Sample Size / Power | Minimum n needed to achieve the desired TPR. | | | |
The goal is to:
- Compare test performance across different sample sizes
- Evaluate how the “discrimination” ability changes with overlap_freq
- Identify the optimal significance level for balancing sensitivity and specificity
ROC
- Consider overlap_freq = 0 as our “positive” case (chains are truly different)
- Consider overlap_freq = 1 as our “negative” case (chains are identical)
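Using those two labels, an ROC curve can be traced by sweeping the significance threshold over the p-values. A base-R sketch with placeholder p-values (the real ones would come from the overlap_freq = 0 and overlap_freq = 1 slices of result):

```r
set.seed(111)

# Placeholder p-values: concentrated near 0 for "positive" cases
# (chains truly different), uniform for "negative" cases (identical chains)
p_pos <- rbeta(1000, 0.5, 10)
p_neg <- runif(1000)

# Sweep the significance threshold alpha from 0 to 1
thresholds <- seq(0, 1, by = 0.01)
tpr <- sapply(thresholds, function(a) mean(p_pos < a)) # sensitivity
fpr <- sapply(thresholds, function(a) mean(p_neg < a)) # 1 - specificity

# Area under the ROC curve via the trapezoidal rule
auc <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
auc
```

Plotting tpr against fpr gives the ROC curve; repeating this per sample_size would show how discrimination improves with more trees per chain.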